A Mostly Data-Driven Approach to Inverse Text Normalization

Authors

  • Ernest Pusateri
  • Bharat Ram Ambati
  • Elizabeth Brooks
  • Ondrej Plátek
  • Donald McAllaster
  • Venki Nagesha
Abstract

For an automatic speech recognition system to produce sensibly formatted, readable output, the spoken-form token sequence produced by the core speech recognizer must be converted to a written-form string. This process is known as inverse text normalization (ITN). Here we present a mostly data-driven ITN system that leverages a set of simple rules and a few handcrafted grammars to cast ITN as a labeling problem. To this labeling problem, we apply a compact bi-directional LSTM. We show that the approach performs well using practical amounts of training data.
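To make the idea of casting ITN as a labeling problem concrete, here is a minimal sketch in Python. It is not the authors' label scheme: the `Label` fields (`rewrite`, `capitalize`, `space_after`) and the `apply_labels` helper are hypothetical names chosen for illustration, and the labels are written by hand, whereas in a system like the one described they would be predicted by a sequence model such as the bi-directional LSTM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Label:
    """Per-token edit instruction (hypothetical fields, for illustration only)."""
    rewrite: Optional[str] = None  # written-form replacement; None keeps the token
    capitalize: bool = False       # upper-case the first character
    space_after: bool = True       # emit a space after this token


def apply_labels(tokens: List[str], labels: List[Label]) -> str:
    """Turn spoken-form tokens into a written-form string, one label per token."""
    if len(tokens) != len(labels):
        raise ValueError("expected exactly one label per token")
    pieces = []
    for token, label in zip(tokens, labels):
        text = token if label.rewrite is None else label.rewrite
        if label.capitalize:
            text = text[:1].upper() + text[1:]
        pieces.append(text)
        if label.space_after:
            pieces.append(" ")
    return "".join(pieces).strip()


# Hand-written labels standing in for model predictions (assumption).
tokens = ["meet", "at", "ten", "thirty"]
labels = [
    Label(capitalize=True),
    Label(),
    Label(rewrite="10:", space_after=False),
    Label(rewrite="30", space_after=False),
]
written = apply_labels(tokens, labels)
print(written)
```

Because every spoken-form token receives exactly one label, the conversion reduces to ordinary sequence tagging, which is what allows a compact tagger to be trained on it.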

Similar resources

A language-modeling approach to inverse text normalization and data cleanup for multimodal voice search applications

In this paper we address two related challenges in multimodal local search applications on mobile devices: first, correctly displaying the business names, and second, harvesting language model training data from an inconsistently labeled corpus. We investigate the impact of common text normalization and the quality of language model training corpus on the accuracy of displayed results. We propo...

Developing EOP materials for Pre-service Cabin Crew: A text-driven approach

One prominent criterion to achieve efficient learning and instruction in an educational setting is the appropriate material(s) specifically developed for that particular group of learners, particularly in an English for Occupational Purposes (EOP) context. This study aimed at developing new EOP materials for pre-service cabin crew in an aviation school. To do so, initially the researchers perfo...

A Sociolinguistic Scrutiny of the Great Gatsby and its Persian Translation in Light of Hatim and Mason’s Framework

Translation studies essentially deals with a socio-communicatively driven and contextualized enterprise. Viewed hence, it seems that no discipline tends to provide the possibility of studying the interrelations between interlocutors to generate meaning within the interactive social context as precisely as sociolinguistics (Federici, 2018). A sociolinguistic approach to translation seems to be i...

Extracting Temporal Information from Open Domain Text: A Comparative Exploration

The utility of data-driven techniques in the end-to-end problem of temporal information extraction is unclear. Recognition of temporal expressions yields readily to machine learning, but normalization seems to call for a rule-based approach. We explore two aspects of the (potential) utility of data-driven methods in the temporal information extraction task. First, we look at whether improving r...

The Calculation of the output price vector by applying reverse linear programming: The novel approach in DEA

In today’s world, wherein every routine is based on economic factors, there is no doubt that theoretical sciences are driven by their capabilities and affordances in terms of economy. As a mathematical tool, data envelopment analysis (DEA) is provided to economics, so that one can investigate associated costs, prices and revenues of economic units. Data Envelopment Analysis (DEA) is a linear...

Publication date: 2017